Simulate and plot some data from simple ARIMA models. a. Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6\) and \(\sigma^2=1\). The process starts with \(y_1=0\).
ar1 <- function(phi, n = 100L) {
  y <- numeric(n)  # y[1] = 0, as specified
  e <- rnorm(n)    # white noise with sigma^2 = 1
  for (i in 2:n) {
    y[i] <- phi * y[i - 1] + e[i]
  }
  tsibble(idx = seq_len(n), y = y, index = idx)
}
- Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
Some examples of changing \(\phi_1\):
ar1(0.6)
## # A tsibble: 100 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 0.471
## 3 3 0.0263
## 4 4 0.488
## 5 5 1.16
## 6 6 0.870
## 7 7 1.86
## 8 8 1.19
## 9 9 0.637
## 10 10 -0.267
## # … with 90 more rows
ar1(0.6) %>% autoplot(y) + labs(title=expression(paste(phi, "=0.6")))
ar1(0.95) %>% autoplot(y) + labs(title=expression(paste(phi, "=0.95")))
ar1(0.05) %>% autoplot(y) + labs(title=expression(paste(phi, "=0.05")))
ar1(-0.65) %>% autoplot(y) + labs(title=expression(paste(phi, "=-0.65")))
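As a sanity check on the hand-rolled simulator, base R's `arima.sim()` (from the stats package) generates the same stationary AR(1) process, using a burn-in period rather than the fixed \(y_1 = 0\) start:

```r
# Simulate an AR(1) with phi = 0.6 using base R; arima.sim() discards a
# burn-in period, so the series does not depend on a fixed starting value.
set.seed(123)
x <- arima.sim(model = list(ar = 0.6), n = 100)
length(x)  # a ts object of length 100
```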
- Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6\) and \(\sigma^2=1\).
ma1 <- function(theta, n = 100L) {
  y <- numeric(n)
  e <- rnorm(n)    # white noise with sigma^2 = 1
  y[1] <- e[1]     # no e[0] is available, so the series starts at e[1]
  for (i in 2:n) {
    y[i] <- theta * e[i - 1] + e[i]
  }
  tsibble(idx = seq_len(n), y = y, index = idx)
}
- Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
ma1(0.6) %>% autoplot(y) + labs(title=expression(paste(theta, "=0.6")))
ma1(0.95) %>% autoplot(y) + labs(title=expression(paste(theta, "=0.95")))
ma1(0.05) %>% autoplot(y) + labs(title=expression(paste(theta, "=0.05")))
ma1(-0.65) %>% autoplot(y) + labs(title=expression(paste(theta, "=-0.65")))
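A useful check on these simulations: the theoretical ACF of an MA(1) cuts off after lag 1, with \(\rho_1 = \theta/(1+\theta^2)\). Base R's `ARMAacf()` confirms this:

```r
# Theoretical ACF of an MA(1) with theta = 0.6:
# rho_1 = 0.6 / (1 + 0.36) ~ 0.44, and all autocorrelations
# beyond lag 1 are exactly zero.
acf1 <- ARMAacf(ma = 0.6, lag.max = 3)
acf1
```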
- Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6\), \(\theta_{1} = 0.6\) and \(\sigma^2=1\).
arma11 <- function(phi, theta, n = 100L) {
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 2:n) {
    y[i] <- phi * y[i - 1] + theta * e[i - 1] + e[i]
  }
  tsibble(idx = seq_len(n), y = y, index = idx)
}
arma11(0.6, 0.6) %>% autoplot(y)
- Generate data from an AR(2) model with \(\phi_{1} =-0.8\), \(\phi_{2} = 0.3\) and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)
ar2 <- function(phi1, phi2, n = 100L) {
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 3:n) {  # start at 3 because two lags are needed
    y[i] <- phi1 * y[i - 1] + phi2 * y[i - 2] + e[i]
  }
  tsibble(idx = seq_len(n), y = y, index = idx)
}
ar2(-0.8, 0.3) %>% autoplot(y)
- Graph the latter two series and compare them.
See the graphs above. The non-stationarity of the AR(2) process has led to oscillations of increasing amplitude, whereas the stationary ARMA(1,1) series fluctuates around a constant level.
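The non-stationarity can be checked directly: an AR(2) is stationary only if all roots of \(1 - \phi_1 z - \phi_2 z^2\) lie outside the unit circle. With \(\phi_1 = -0.8\) and \(\phi_2 = 0.3\):

```r
# Roots of the AR characteristic polynomial 1 - phi1*z - phi2*z^2,
# i.e. 1 + 0.8z - 0.3z^2, coefficients in increasing order of power.
# A root with modulus less than 1 implies a non-stationary process.
roots <- polyroot(c(1, 0.8, -0.3))
Mod(roots)  # one modulus is below 1, confirming non-stationarity
```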
Consider `aus_airpassengers`, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
- Use `ARIMA()` to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
aus_airpassengers %>% autoplot(Passengers)
fit <- aus_airpassengers %>%
model(arima = ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
fit %>% gg_tsresiduals()
fit %>% forecast(h = 10) %>% autoplot(aus_airpassengers)
- Write the model in terms of the backshift operator.
\[(1-B)^2y_t = (1+\theta B)\varepsilon_t\] where \(\varepsilon_t\sim\text{N}(0,\sigma^2)\), \(\theta = -0.90\) and \(\sigma^2 = 4.31\).
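Expanding the backshift polynomial gives the equivalent difference-equation form, which is the recursion used when forecasting:
\[y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t + \theta\varepsilon_{t-1}.\]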
- Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
aus_airpassengers %>%
model(arima = ARIMA(Passengers ~ 1 + pdq(0,1,0))) %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
- Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to part b. Remove the constant and see what happens.
aus_airpassengers %>%
model(arima = ARIMA(Passengers ~ 1 + pdq(2,1,2))) %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
aus_airpassengers %>%
model(arima = ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## # A mable: 1 x 1
## arima
## <model>
## 1 <NULL model>
With the constant removed, `ARIMA()` is unable to estimate the ARIMA(2,1,2) model and returns a NULL model.
- Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
aus_airpassengers %>%
model(arima = ARIMA(Passengers ~ 1 + pdq(0,2,1))) %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
The forecast trend is now quadratic, and there is a warning that this is generally a bad idea.
For the United States GDP series (from `global_economy`):
- If necessary, find a suitable Box-Cox transformation for the data;
us_economy <- global_economy %>%
filter(Code == "USA")
us_economy %>%
autoplot(GDP)
lambda <- us_economy %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.2819443
us_economy %>%
autoplot(box_cox(GDP, lambda))
It seems that a Box-Cox transformation may be useful here.
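For reference, the transformation applied by `box_cox()` is, for \(\lambda \neq 0\),
\[w_t = \frac{y_t^\lambda - 1}{\lambda},\]
with \(w_t = \log(y_t)\) when \(\lambda = 0\). The estimated \(\lambda \approx 0.28\) sits between a log transformation (\(\lambda = 0\)) and a cube-root transformation (\(\lambda = 1/3\)).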
- fit a suitable ARIMA model to the transformed data using `ARIMA()`;
fit <- us_economy %>%
model(ARIMA(box_cox(GDP, lambda)))
report(fit)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
- try some other plausible models by experimenting with the orders chosen;
fit <- us_economy %>%
model(
arima010 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0, 1, 0)),
arima011 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0, 1, 1)),
arima012 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0, 1, 2)),
arima013 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0, 1, 3)),
arima110 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 0)),
arima111 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 1)),
arima112 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 2)),
arima113 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 3)),
arima210 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2, 1, 0)),
arima211 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2, 1, 1)),
arima212 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2, 1, 2)),
arima213 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2, 1, 3)),
arima310 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(3, 1, 0)),
arima311 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(3, 1, 1)),
arima312 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(3, 1, 2)),
arima313 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(3, 1, 3))
)
- choose what you think is the best model and check the residual diagnostics;
fit %>%
glance() %>%
arrange(AICc) %>%
select(.model, AICc)
## # A tibble: 16 × 2
## .model AICc
## <chr> <dbl>
## 1 arima110 657.
## 2 arima011 659.
## 3 arima111 659.
## 4 arima210 659.
## 5 arima012 660.
## 6 arima112 661.
## 7 arima211 661.
## 8 arima310 662.
## 9 arima013 662.
## 10 arima312 663.
## 11 arima311 664.
## 12 arima113 664.
## 13 arima212 664.
## 14 arima313 665.
## 15 arima213 666.
## 16 arima010 668.
The best according to the AICc values is the ARIMA(1,1,0) w/ drift model.
best_fit <- us_economy %>%
model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 0)))
best_fit %>% report()
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
best_fit %>% gg_tsresiduals()
augment(best_fit) %>% features(.innov, ljung_box, dof = 2, lag = 10)
## # A tibble: 1 × 4
## Country .model lb_stat lb_pvalue
## <fct> <chr> <dbl> <dbl>
## 1 United States ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 0)) 3.81 0.874
The residuals pass the Ljung-Box test, but the histogram looks negatively skewed.
- produce forecasts of your fitted model. Do the forecasts look reasonable?
best_fit %>%
forecast(h = 10) %>%
autoplot(us_economy)
These look reasonable. For comparison, let’s fit a model with no transformation.
fit1 <- us_economy %>% model(ARIMA(GDP))
fit1 %>%
forecast(h = 10) %>%
autoplot(us_economy)
Notice the effect of the transformation on the forecasts. Increase the forecast horizon to see what happens. Notice also the width of the prediction intervals.
us_economy %>%
model(
ARIMA(GDP),
ARIMA(box_cox(GDP, lambda))
) %>%
forecast(h = 100) %>%
autoplot(us_economy)
- compare the results with what you would obtain using `ETS()` (with no transformation).
us_economy %>%
model(ETS(GDP)) %>%
forecast(h = 10) %>%
autoplot(us_economy)
The point forecasts are similar; however, the ETS prediction intervals are much wider.
Consider the number of Snowshoe Hare furs traded by the Hudson Bay Company between 1845 and 1935 (data set `pelt`).
- Produce a time plot of the time series.
pelt %>% autoplot(Hare)
- Assume you decide to fit the following model: \[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \phi_4 y_{t-4} + \varepsilon_t, \] where \(\varepsilon_t\) is a white noise series. What sort of ARIMA model is this (i.e., what are \(p\), \(d\), and \(q\))?
This is an ARIMA(4,0,0), hence \(p=4\), \(d=0\) and \(q=0\).
- By examining the ACF and PACF of the data, explain why this model is appropriate.
pelt %>% gg_tsdisplay(plot_type = "partial")
## Plot variable not specified, automatically selected `y = Hare`
The last significant spike in the PACF is at lag 4, while the ACF shows a damped cyclic pattern, which is consistent with an AR(4) model.
fit <- pelt %>% model(AR4 = ARIMA(Hare ~ pdq(4,0,0)))
fit %>% gg_tsresiduals()
- The last five values of the series are given below:
| Year | 1931 | 1932 | 1933 | 1934 | 1935 |
|---|---|---|---|---|---|
| Number of hare pelts | 19520 | 82110 | 89760 | 81660 | 15760 |
The estimated parameters are \(c = 30993\), \(\phi_1 = 0.82\), \(\phi_2 = -0.29\), \(\phi_3 = -0.01\), and \(\phi_4 = -0.22\). Without using the `forecast()` function, calculate forecasts for the next three years (1936–1938).
\[\begin{align*} \hat{y}_{T+1|T} & = 30993 + 0.82 \times 15760 - 0.29 \times 81660 - 0.01 \times 89760 - 0.22 \times 82110 = 2051.57 \\ \hat{y}_{T+2|T} & = 30993 + 0.82 \times 2051.57 - 0.29 \times 15760 - 0.01 \times 81660 - 0.22 \times 89760 = 8223.14 \\ \hat{y}_{T+3|T} & = 30993 + 0.82 \times 8223.14 - 0.29 \times 2051.57 - 0.01 \times 15760 - 0.22 \times 81660 = 19387.96 \end{align*}\]
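The same recursion can be carried out in base R. Note that using the coefficients rounded to two decimals (as reported above) gives values that differ somewhat from R's full-precision output:

```r
# Iterate the AR(4) recursion by hand, using the rounded parameter
# estimates from the text (full-precision values will differ slightly).
c0  <- 30993
phi <- c(0.82, -0.29, -0.01, -0.22)          # phi_1, ..., phi_4
y   <- c(19520, 82110, 89760, 81660, 15760)  # hare pelts, 1931-1935
for (h in 1:3) {
  n <- length(y)
  y <- c(y, c0 + sum(phi * y[n:(n - 3)]))    # y[n] pairs with phi_1, etc.
}
tail(y, 3)  # forecasts for 1936-1938
```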
- Now fit the model in R and obtain the forecasts using `forecast()`. How are they different from yours? Why?
pelt %>%
model(ARIMA(Hare ~ pdq(4, 0, 0))) %>%
forecast(h=3)
## # A fable: 3 x 4 [1Y]
## # Key: .model [1]
## .model Year Hare .mean
## <chr> <dbl> <dist> <dbl>
## 1 ARIMA(Hare ~ pdq(4, 0, 0)) 1936 N(2052, 5.9e+08) 2052.
## 2 ARIMA(Hare ~ pdq(4, 0, 0)) 1937 N(8223, 9.8e+08) 8223.
## 3 ARIMA(Hare ~ pdq(4, 0, 0)) 1938 N(19388, 1.1e+09) 19388.
Any differences will be due to rounding errors.
The population of Switzerland from 1960 to 2017 is in data set `global_economy`.
- Produce a time plot of the data.
swiss_pop <- global_economy %>%
filter(Country == "Switzerland") %>%
select(Year, Population) %>%
mutate(Population = Population / 1e6)
autoplot(swiss_pop, Population)
- You decide to fit the following model to the series: \[y_t = c + y_{t-1} + \phi_1 (y_{t-1} - y_{t-2}) + \phi_2 (y_{t-2} - y_{t-3}) + \phi_3( y_{t-3} - y_{t-4}) + \varepsilon_t\] where \(y_t\) is the Population in year \(t\) and \(\varepsilon_t\) is a white noise series. What sort of ARIMA model is this (i.e., what are \(p\), \(d\), and \(q\))?
This is an ARIMA(3,1,0), hence \(p=3\), \(d=1\) and \(q=0\).
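In backshift notation the same model is
\[(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(1 - B)y_t = c + \varepsilon_t.\]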
- Explain why this model was chosen using the ACF and PACF of the differenced series.
swiss_pop %>% gg_tsdisplay(Population, plot="partial")
swiss_pop %>% gg_tsdisplay(difference(Population), plot="partial")
The significant spike at lag 3 in the PACF, coupled with the exponential decay in the ACF, for the differenced series, signals an AR(3) for the differenced series.
- The last five values of the series are given below.
| Year | 2013 | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|
| Population (millions) | 8.09 | 8.19 | 8.28 | 8.37 | 8.47 |
The estimated parameters are \(c = 0.0053\), \(\phi_1 = 1.64\), \(\phi_2 = -1.17\), and \(\phi_3 = 0.45\). Without using the `forecast()` function, calculate forecasts for the next three years (2018–2020).
\[\begin{align*} \hat{y}_{T+1|T} & = 0.0053 + 8.47 + 1.64 \times (8.47 - 8.37) - 1.17 \times (8.37 - 8.28) + 0.45 \times (8.28 - 8.19) = 8.56 \\ \hat{y}_{T+2|T} & = 0.0053 + 8.56 + 1.64 \times (8.56 - 8.47) - 1.17 \times (8.47 - 8.37) + 0.45 \times (8.37 - 8.28) = 8.65 \\ \hat{y}_{T+3|T} & = 0.0053 + 8.65 + 1.64 \times (8.65 - 8.56) - 1.17 \times (8.56 - 8.47) + 0.45 \times (8.47 - 8.37) = 8.73 \end{align*}\]
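Again, the recursion can be run in base R with the rounded parameters from the text; because the estimates are rounded, the values will differ slightly from R's full-precision output:

```r
# Iterate the differenced AR(3) recursion by hand
# (rounded parameter estimates from the text).
c0  <- 0.0053
phi <- c(1.64, -1.17, 0.45)             # phi_1, phi_2, phi_3
y   <- c(8.09, 8.19, 8.28, 8.37, 8.47)  # population (millions), 2013-2017
for (h in 1:3) {
  n <- length(y)
  d <- diff(y)                          # first differences
  y <- c(y, c0 + y[n] + sum(phi * d[(n - 1):(n - 3)]))
}
round(tail(y, 3), 2)  # forecasts for 2018-2020
```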
- Now fit the model in R and obtain the forecasts from the same model. How are they different from yours? Why?
global_economy %>%
filter(Country == "Switzerland") %>%
mutate(Population = Population / 1e6) %>%
model(ARIMA(Population ~ 1 + pdq(3, 1, 0))) %>%
forecast(h=3)
## # A fable: 3 x 5 [1Y]
## # Key: Country, .model [1]
## Country .model Year Population .mean
## <fct> <chr> <dbl> <dist> <dbl>
## 1 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2018 N(8.6, 0.00013) 8.56
## 2 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2019 N(8.6, 0.001) 8.65
## 3 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2020 N(8.7, 0.0033) 8.73
Any differences will be due to rounding errors.
- Select a time series from Quandl. Then copy its short URL and import the data.
y <- Quandl::Quandl("ODA/PBEVE_INDEX") %>%
mutate(Month = yearmonth(Date)) %>%
as_tsibble(index=Month)
- Plot graphs of the data, and try to identify an appropriate ARIMA model.
y %>% gg_tsdisplay(Value, plot_type="partial")
y %>% gg_tsdisplay(difference(Value), plot_type="partial")
This looks like an ARIMA(5,1,0) or an ARIMA(0,1,5).
fit <- y %>%
model(
arima510 = ARIMA(Value ~ pdq(5,1,0) + PDQ(0,0,0)),
arima015 = ARIMA(Value ~ pdq(0,1,5) + PDQ(0,0,0)),
auto = ARIMA(Value)
)
glance(fit) %>% arrange(AICc)
## # A tibble: 3 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 arima015 16.7 -1456. 2924. 2924. 2949. <cpl [0]> <cpl [5]>
## 2 arima510 16.8 -1458. 2929. 2929. 2954. <cpl [5]> <cpl [0]>
## 3 auto 17.0 -1460. 2933. 2933. 2963. <cpl [26]> <cpl [2]>
My guessed ARIMA(0,1,5) is best.
- Do residual diagnostic checking of your ARIMA model. Are the residuals white noise?
fit %>%
select(arima015) %>%
gg_tsresiduals()
No obvious problems.
- Use your chosen ARIMA model to forecast the next four years.
fit %>%
select(arima015) %>%
forecast(h = "4 years") %>%
autoplot(y)
- Now try to identify an appropriate ETS model.
y %>%
model(ets = ETS(Value))
## # A mable: 1 x 1
## ets
## <model>
## 1 <ETS(M,Ad,N)>
Automatic selection gives a damped additive trend.
- Do residual diagnostic checking of your ETS model. Are the residuals white noise?
y %>%
model(ets = ETS(Value)) %>%
gg_tsresiduals()
There is a large amount of autocorrelation at lag 5.
- Use your chosen ETS model to forecast the next four years.
y %>%
model(ets = ETS(Value)) %>%
forecast(h = "4 years") %>%
autoplot(y)
The prediction intervals are much wider than those from the ARIMA model.
- Which of the two models do you prefer?
I prefer the ARIMA model because it better captures the autocorrelations in the data, and has narrower prediction intervals.